ABSTRACT


An important component of any automated image analysis system is the
detection and classification of objects.  In this report, we consider the
first of these problems, where the specific goal is to detect anomalous
areas (e.g., man-made objects) in textured backgrounds such as trees,
grass, and fields in aerial photographs.  Our detection algorithm relies
on a significance test which adapts itself to the changing background in
such a way that a constant false alarm rate is maintained.  Furthermore,
this test has a potentially practical implementation since it can be
expressed in terms of the residuals of an adaptive two-dimensional linear
predictor.  The algorithm is demonstrated with both synthetic and
real-world images.
1.  INTRODUCTION


The problem of detecting small regions of an image which differ from
their surroundings is of considerable interest in areas such as optical
aerial reconnaissance, radar and infrared image analysis, and medical
diagnosis through imagery.  In the application of aerial reconnaissance,
detection of such "anomalous" areas (or objects) of an image is often the
first step in image analysis systems which perform automatic
classification of man-made objects [1].  In this report, we shall address
the particular problem of detecting objects in natural terrain (i.e.,
textured backgrounds) such as trees, grass, and fields in aerial
photographs.  We shall view an object as an area of an image with
different second-order statistical properties from the surrounding area or
background.  Furthermore, we assume that the object's statistics are
generally unknown (it is desired to detect broad classes of objects), but
that the background statistics may be known or can be estimated.

Usually, in detection theory, the object (or signal) is added to the
background (or noise), and filtering procedures are well-established for
increasing the signal-to-noise ratio [2].  In image processing, however,
the object pixels replace the background pixels.  Motivated by this
observation and the assumption that the object's statistics are unknown,
we decide on the presence or absence of an object through significance
testing [3].  In applying significance testing, we shall assume the
background is characterized by a Gaussian probability density function.
If a set of pixels falls in a critical region of this density function
(i.e., a region of low probability), we reject the hypothesis that the
pixels form part of the background.
To avoid estimating and inverting the large covariance matrix required
in such a test, we impose structure on the background through modeling.
As a first step toward this end, our significance test is expressed in
terms of error residuals of two-dimensional (2-D) linear prediction.  More
specifically, the test, under a Gaussian assumption, first requires
determining the error residuals from optimally predicting (in a least
squares sense) each pixel from a linear combination of its neighboring
pixels.  The error residuals are then summed over a small area, suitably
normalized, and finally compared to a threshold.  Since a 2-D prediction
filter is associated with our significance test, we can interpret this as
representing the background by a 2-D autoregressive model [4].  The
parameters of this autoregressive model are therefore assumed known or
estimatable from the background.  Furthermore, when the order of the model
is fixed and small, an approximate but computationally practical
implementation of the test results.  It is interesting to note that the
linear prediction residual has been used in a number of detection
problems, such as seismic event detection [5,6] and pitch detection in
speech waveforms [7].  In the context of image processing, the prediction
residual has been used successfully in, for example, segmenting textured
images [4].

Since the background characteristics of an image are not stationary,
i.e., are changing with position, in order to guarantee a constant false
alarm rate (CFAR) [8] over the entire image, we must vary the threshold
within our significance test as a function of the pixel position.  We
shall show that through our prediction interpretation of the test, a
simple adaptive thresholding procedure arises which yields CFAR detection.
The nonstationarity of an image also implies that we must adaptively
estimate the autoregressive model parameters at each pixel.  The
particular procedure employed is based on the 2-D covariance method of
linear prediction [9], which is amenable to a recursive computation.  In
summary then, through adaptive estimation and thresholding, our
significance test adapts itself to the changing background statistics to
guarantee CFAR detection.

In the final pages of this report, the algorithm is successfully
demonstrated through automatically detecting small-extent objects in real
and synthetic images with varying textured backgrounds.  Our real images
are extracted from aerial photographs obtained from the Rome Air
Development Center (RADC) data base.  In these examples, we explore
different approximations to the exact significance test.  In particular,
first and second quadrant prediction filters, averages of such filters,
and noncausal prediction filters are investigated.

2.  SIGNIFICANCE  TESTING 

The problem of object detection in images is viewed as one of finding
small areas in an image whose statistical properties do not match those of
the surrounding area or background.  Essentially, we wish to determine
whether a set of pixels under examination represents pure background or
whether it is partly or wholly object.  The area of statistics that
addresses such questions is called significance testing [3].  The basic
idea is illustrated in Figure 1.  A measurement is made of some random
phenomenon characterized by probability density p(x).  A critical region C
of low probability (α) is chosen corresponding to unlikely events.  If
measurements fall in the critical region, we reject the hypothesis that
the measurements really belong to the density p(x).  The significance
level of the test is determined by the probability of events (α) in the
critical region.  For example, if this probability is α = 0.05, then the
significance test is said to be at the 5% level.

Fig.  1.  Probability density with critical region C.

In our case, the probability density is that of the background.  If a
set of measurements falls in the critical region, then we reject the
hypothesis that the pixels under consideration form a part of the
background; that is, we decide that they represent (at least partly) an
object.  More specifically, given an image and a small region S at (n,m),
let the image points in S be denoted by (see Figure 2)

    \mathbf{x} = \{ x_1, x_2, \ldots, x_N \}                              (1)

We want to decide whether the points in S correspond to a background
random field with probability density p(x) (i.e., S contains just
background) or whether S contains something other than the background
random field (object possibly present).  We want to do this for all (n,m).

Thus, we must determine from the background probability density
function (which we assume is known or estimatable) a critical region C of
small probability (α), which is the level of significance.  A critical
region C can be defined by

    p(\mathbf{x}) < \lambda                                               (2a)

so that the relation between λ and the level of significance α is given by

    \int_R p(\mathbf{x}) \, d\mathbf{x} = \alpha                          (2b)

where the region R contains only the points x which satisfy p(x) < λ.
Thus, in our significance test, if (2a) is satisfied, we decide "more than
just background"; otherwise, we decide "background only".  We repeat this
for every (n,m).

Note that the level of significance equals the probability of false
alarm [2], i.e., the probability that we say an object is present given
that we have pure background.  Ideally, we wish to make the false-alarm
probability as small as possible while making the probability of detection
(i.e., the probability of saying a target is present when a target is
indeed present) as large as possible.  However, since we assume no known
statistics about the target, we cannot in this problem define such a
probability.  Consequently, we are forced to compute the probability of
detection empirically.

Let us now suppose that the background corresponds to a Gaussian random
process with mean m = E[x] and covariance K = E[(x-m)(x-m)^T].  Then

    p(\mathbf{x}) = \frac{1}{(2\pi)^{N/2} |K|^{1/2}}
        \exp\left[ -\frac{1}{2} (\mathbf{x}-\mathbf{m})^T K^{-1} (\mathbf{x}-\mathbf{m}) \right]      (3)

We shall assume that the background is generally nonstationary, so that
the covariance matrix K has no special structure.  Nevertheless, we shall
assume for the present that this matrix is known or can be estimated.
Then, taking the logarithm of p(x) < λ, from (3), our significance test
becomes:

    (\mathbf{x}-\mathbf{m})^T K^{-1} (\mathbf{x}-\mathbf{m}) > f(K,\lambda)      (4a)

with

    f(K,\lambda) = -\ln\left[ (2\pi)^N |K| \right] - 2 \ln \lambda        (4b)

where the function f(K,λ) is considered a threshold, corresponding to a
certain probability of false alarm.
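
As a rough illustration of (4a) and (4b), the following Python sketch evaluates the quadratic form and the threshold for a single measurement vector.  The function name and the numerical values in the usage lines are hypothetical and serve only as an example; the mean, covariance, and λ are assumed to be given.

```python
import numpy as np

def exact_significance_test(x, m, K, lam):
    """Evaluate the test (4a) with the threshold f(K, lambda) of (4b).

    Returns (statistic, threshold, reject), where reject is True when the
    hypothesis "background only" is rejected.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    K = np.asarray(K, dtype=float)
    n = x.size
    d = x - m
    # Quadratic form (x-m)^T K^{-1} (x-m), computed by solving K y = d
    statistic = float(d @ np.linalg.solve(K, d))
    # f(K, lambda) = -ln[(2 pi)^N |K|] - 2 ln(lambda)
    _, logdet = np.linalg.slogdet(K)
    threshold = float(-(n * np.log(2.0 * np.pi) + logdet) - 2.0 * np.log(lam))
    return statistic, threshold, bool(statistic > threshold)

# Hypothetical 2-point region with an assumed background covariance
K = np.array([[2.0, 0.5], [0.5, 1.0]])
stat, thresh, reject = exact_significance_test([3.0, -2.0], [0.0, 0.0], K, 0.01)
```

Solving the linear system rather than forming the inverse of K explicitly is a numerical convenience and does not change the test.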

In the next section, we show that this test can be expressed in terms
of the residuals of an adaptive 2-D linear predictor.  The reasons for
this alternative formulation of the test are multi-fold.  Perhaps the most
important is to avoid the requirement of estimating and inverting a large
covariance matrix.  For example, if S is of extent 4x4, then K is of
extent 16x16.  We shall see that the residual error interpretation, along
with imposing an autoregressive model of the background, leads to an
approximate implementation of the test, requiring far fewer correlation
coefficients.  Furthermore, the true test involves a "one-shot" approach
to the problem; i.e., a covariance matrix is estimated and used in the
thresholding operation.  The prediction approach, on the other hand,
decomposes the true test into a number of components.  Such a
decomposition allows for both an alternative intuitive perspective of the
algorithm and a means of adjusting the various components to improve the
test.  We shall also see in section 4 that this decomposition leads to a
choice of λ for guaranteeing CFAR detection with a nonstationary
background.

3.  DETECTION  BASED  ON  LINEAR  PREDICTION  RESIDUALS 

We now wish to show that the significance test of the previous section
can be expressed in terms of the error residuals in optimally predicting
each sample of our small region S by certain linear combinations of
samples within S.  This interpretation leads to a number of useful
approximations to the true test when the image background is modeled by an
autoregressive process.

3.1  The  Relationship  of  Significance  Testing  with  Linear  Prediction 

Our connection relies on the fact that since the background covariance
matrix K is symmetric and positive definite, it can be uniquely factored
in terms of upper and lower triangular and diagonal matrices [10].
In particular, we have

    K = L D L^T                                                           (5)

where L is lower triangular with ones along its diagonal, D is a diagonal
matrix, and T denotes transpose.  Substituting (5) into (4), we have

    (\mathbf{x}-\mathbf{m})^T K^{-1} (\mathbf{x}-\mathbf{m})
        = (\mathbf{x}-\mathbf{m})^T (L D L^T)^{-1} (\mathbf{x}-\mathbf{m})
        = (\mathbf{x}-\mathbf{m})^T (L^{-1})^T D^{-1} L^{-1} (\mathbf{x}-\mathbf{m})
        = \mathbf{e}^T D^{-1} \mathbf{e} > f(K,\lambda)                   (6a)

where

    \mathbf{e} = L^{-1} (\mathbf{x}-\mathbf{m})                           (6b)

It is straightforward to show that since L is lower triangular with
unit diagonal, L^{-1} has the same property, and thus (6b) represents a
causal transformation of the vector x-m [10].  That is, each e_k is a
function of x_j - m_j for j ≤ k.  Furthermore, it can be shown that this
transformation corresponds to successive orders of linear prediction where
the diagonal elements of D are the prediction error variances [10].  That
is, each row of L^{-1} represents the coefficients required in optimally
predicting each element of the zero-mean vector x-m from a linear
combination of its previous values.  Since the covariance matrix K is that
of the background, the coefficients of L^{-1} correspond to optimal
prediction of the background and not object areas.  This prediction
concept is illustrated in Figure 3 for a zero-mean process.  More
specifically, an element e_k of e in (6b) is given by


    e_k = (x_k - m_k) - \mathbf{a}_k^T [\, x_1 - m_1, \ldots, x_{k-1} - m_{k-1} \,]^T      (7)

where a_k is the vector of coefficients for optimally predicting
x_k - m_k from the previous values of x-m.  The diagonal elements of D are
given by σ_k^2 = var(e_k).  Note that the e_k's are uncorrelated, i.e.,
they form a white process when the pixels being predicted (and doing the
prediction) are background pixels [10].

Returning now to our significance test, we have from (6a),

    \mathbf{e}^T D^{-1} \mathbf{e} = \sum_{k=1}^{N} \frac{e_k^2}{\sigma_k^2} > f(K,\lambda)      (8)

where σ_k^2 is the prediction error variance associated with predicting a
background value x_k with its mean subtracted.  Thus, the significance
test involves first forming the prediction residuals e_k over S (from
growing predictors) and normalizing e_k^2 with the corresponding
prediction error variance σ_k^2.  These normalized residuals are summed
over S and then compared to the threshold f(K,λ).
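
The equivalence between the quadratic form in (4a) and the sum of normalized residuals in (8) can be checked numerically.  The sketch below is only an illustration under the assumption that K is known: it obtains L and D of (5) from a Cholesky factor (one convenient route to the factorization, not necessarily the one intended in [10]), and the function names are hypothetical.

```python
import numpy as np

def ldl_from_covariance(K):
    """Factor K = L D L^T with unit-diagonal lower-triangular L, as in (5)."""
    C = np.linalg.cholesky(K)      # K = C C^T with C lower triangular
    d = np.diag(C)
    L = C / d                      # divide column j by d[j] to get a unit diagonal
    D = d ** 2                     # diagonal of D: prediction error variances
    return L, D

def residual_statistic(x, m, K):
    """Return the residuals e of (6b), the variances, and the sum in (8)."""
    L, D = ldl_from_covariance(K)
    e = np.linalg.solve(L, np.asarray(x, float) - np.asarray(m, float))
    return e, D, float(np.sum(e ** 2 / D))

# Hypothetical 6-point region with an arbitrary positive definite covariance
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
K = A @ A.T + 6.0 * np.eye(6)
x = rng.standard_normal(6)
e, D, stat = residual_statistic(x, np.zeros(6), K)
direct = float(x @ np.linalg.solve(K, x))   # agrees with stat up to rounding
```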


3.2  Approximations  Based  on  Image  Modeling 

We saw in the previous section that samples used in prediction are
"causally" related to each x_k being predicted.  Furthermore, we would
like to say that each x_k is predicted from its "past" values which fall
within S.  The notion of causality is thus based on a definition of past.
For any point (n_0,m_0), we define the past to be the set of points

    \{ (n,m) \mid n = n_0,\ m < m_0;\ \ n < n_0,\ -\infty < m < \infty \}      (9)

which are illustrated in Figure 4.  As a matter of notation, if (n_1,m_1)
is in the past of (n_2,m_2), we denote this by (n_1,m_1) < (n_2,m_2).


Fig.  4.  Definition of past.


Let us now suppose that each pixel of the background image is known
to be linearly related to its past.  Specifically, let us suppose that the
background is generated by an autoregressive model of the form:

    x(n,m) = \sum\sum_{(j,k) > (0,0)} a(n,m;j,k) \, x(n-j,m-k) + \sigma(n,m) \, w(n,m)      (10)

where w(n,m) is white Gaussian noise with unit variance and where the
model coefficients and the variance of the driving function may vary in
space (i.e., we assume the background is nonstationary).  We shall refer
to such a model as a nonstationary causal autoregressive model.

Let us further suppose that the background follows a causal
autoregressive model of finite order.  For example, suppose the background
follows a third order model of the form:

    x(n,m) = a(n,m;1,0) \, x(n-1,m) + a(n,m;0,1) \, x(n,m-1)
             + a(n,m;1,1) \, x(n-1,m-1) + \sigma(n,m) \, w(n,m)           (11)

Since the a(n,m;j,k)'s and the impulse response associated with this model
have a 1st-quadrant region of support, we refer to (11) as a 1st-quadrant
model.
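
To make the model (11) concrete, the following Python sketch synthesizes a field from the third order recursion.  It is a simplified illustration: the coefficients and σ are held constant over the image (the stationary special case of the space-varying model), the field is started from a zero boundary, and the function name and coefficient values are hypothetical.

```python
import numpy as np

def synthesize_quadrant_ar(shape, a10, a01, a11, sigma, seed=0):
    """Generate a field from the 1st-quadrant model (11) with constant
    coefficients a(1,0), a(0,1), a(1,1) and constant sigma (an assumption)."""
    rng = np.random.default_rng(seed)
    rows, cols = shape
    x = np.zeros((rows + 1, cols + 1))      # extra zero row and column as boundary
    w = rng.standard_normal((rows, cols))   # white Gaussian noise, unit variance
    for n in range(1, rows + 1):
        for m in range(1, cols + 1):
            x[n, m] = (a10 * x[n - 1, m] + a01 * x[n, m - 1]
                       + a11 * x[n - 1, m - 1] + sigma * w[n - 1, m - 1])
    return x[1:, 1:]

# Hypothetical separable example: (1 - 0.5 z1^-1)(1 - 0.5 z2^-1)
field = synthesize_quadrant_ar((64, 64), 0.5, 0.5, -0.25, 1.0, seed=1)
```

The separable coefficient choice is used here only because it gives a stable recursion; any other stable set of coefficients could be substituted.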

Consider now using in the significance test of (8) a fixed-order
predictor with the same region of support as the a(n,m;j,k)'s in (11),
rather than a growing predictor.  Then, except for the L-shaped boundary
elements illustrated in Figure 5, each fixed-order prediction error equals
the corresponding element e_k in (8) from the growing predictor.  That is,
outside of the boundary, the remaining coefficients under the (growing)
prediction mask of Figure 4 are zero, and thus do not affect the
prediction.  Note that with the fixed-order predictor, elements outside of
our region S will now be used to predict the boundary elements.

Fig.  5.  Representation of a first-quadrant space-varying fixed-order
predictor.


More generally, the background random field can be well represented
by a causal autoregressive model of arbitrarily large, but finite order
[11].  Consequently, there generally exists a larger boundary region
(i.e., larger than that of Figure 5) where the fixed-order prediction
error deviates from the prediction error corresponding to the growing
predictor of the true significance test.  It is important to observe that
with a fixed-order predictor, the number of required correlation
coefficients now depends on the order of the fixed predictor and not on
the size of S, as in the true significance test.  We shall return to this
issue in section 4.

Finally, an approximate significance test can be written as:

    \sum_{k=1}^{N} \frac{\tilde{e}_k^2}{\tilde{\sigma}_k^2} > f(K,\lambda)      (12)

where \tilde{e}_k and \tilde{\sigma}_k^2 are the prediction error and
prediction error variance, respectively, associated with a fixed-order
prediction of each element of S.  Note that although the predictor is
fixed in order, it does vary in space over S (as illustrated in Figure 5),
since we have assumed the background to be generally nonstationary.


3.3  The  Question  of  Directionality 

It is curious that although the significance test was derived with no
imposed directionality, the linear predictor which results is causal.
This apparent contradiction can be resolved by noting that the causality
of the growing predictor in (6) arises only because of our way of ordering
the samples of S.

For example, suppose that we order the samples in reverse order as
illustrated in Figure 6.  With this ordering, the predictors become
"anti-causal".  As before, we can approximate these growing predictors by
fixed-order predictors.

More generally, we can assume any ordering of the samples in S (e.g.,
diagonal, random, etc.).  Our choice in an approximate significance test
(using a fixed-order predictor) ultimately depends on how closely the
background process matches the assumed pixel relationship.  Since textures
in images appear to have no directionality, some sort of noncausal
prediction mask may be most appropriate.  Alternatively, we may attempt to
remove the directionality imposed by a fixed-order causal predictor by
averaging many predictors of different directionalities as, for example,
1st, 2nd, 3rd, and 4th-quadrant predictors.
Fig.  6.  "Anti-causal" prediction of x_k.


4.  CONSTANT  FALSE  ALARM  RATE  DETECTION


We now consider determining λ in our threshold f(K,λ) for a constant
false alarm rate.  That is, since the background covariance matrix K will
change as our region S sweeps over the image, we must adjust λ in
accordance with the changes in K so that the integral in (2) remains
constant.  In this section, we derive the required functional form of λ.
Our result relies on the orthogonalization of the elements of x through
the matrix decomposition (5).

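Without loss of generality, we shall assume a zero-mean process.
Then from (2), (3), and (6), we have:

    \alpha = \int_{p(\mathbf{x}) < \lambda} p(\mathbf{x}) \, d\mathbf{x}
           = \int_{p(\mathbf{x}) < \lambda} \frac{1}{(2\pi)^{N/2} |K|^{1/2}}
             \exp\left[ -\frac{1}{2} (L^{-1}\mathbf{x})^T D^{-1} (L^{-1}\mathbf{x}) \right] d\mathbf{x}      (13)

Now, let

    \mathbf{e} = L^{-1} \mathbf{x}                                        (14)

so that, since L^{-1} has unit diagonals, using the method of Jacobians
[12], we have

    d\mathbf{e} = d\mathbf{x}                                             (15)

Thus, substituting (14) and (15) into (13), we obtain

    \int_{p(L\mathbf{e}) < \lambda} \frac{1}{(2\pi)^{N/2} |K|^{1/2}}
        \exp\left[ -\frac{1}{2} \mathbf{e}^T D^{-1} \mathbf{e} \right] d\mathbf{e} = \alpha      (16)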
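Furthermore, we have (since D is diagonal):

    \int_{p(L\mathbf{e}) < \lambda} \frac{1}{(2\pi)^{N/2} |K|^{1/2}}
        \exp\left[ -\frac{1}{2} (D^{-1/2}\mathbf{e})^T (D^{-1/2}\mathbf{e}) \right] d\mathbf{e}
    = \int_{p(L D^{1/2} \bar{\mathbf{e}}) < \lambda} \frac{1}{(2\pi)^{N/2} |K|^{1/2}}
        \exp\left[ -\frac{1}{2} \bar{\mathbf{e}}^T \bar{\mathbf{e}} \right] |D|^{1/2} \, d\bar{\mathbf{e}}      (17)

where we have used the substitutions:

    \bar{\mathbf{e}} = D^{-1/2} \mathbf{e}                                (18a)

    d\mathbf{e} = |D|^{1/2} \, d\bar{\mathbf{e}}                          (18b)

Noting that |K| = |D| [3], we have from (18b),

    \int_{p(L D^{1/2} \bar{\mathbf{e}}) < \lambda} \frac{1}{(2\pi)^{N/2}}
        \exp\left[ -\frac{1}{2} \bar{\mathbf{e}}^T \bar{\mathbf{e}} \right] d\bar{\mathbf{e}} = \alpha      (19)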

Observe that the integrand in (19) involves the orthogonal (i.e.,
white) elements of the normalized residual vector
\bar{\mathbf{e}} = D^{-1/2} \mathbf{e} and does not depend on the
statistics of x.  Furthermore, considering the limits of integration in
(19), we see that the boundary of our (transformed) critical region is
given by the equation:

    p(L D^{1/2} \bar{\mathbf{e}}) = \lambda

which, from (2) and (6), can be expressed as

    \bar{\mathbf{e}}^T \bar{\mathbf{e}} = -2 \ln\left[ (2\pi)^{N/2} |K|^{1/2} \lambda \right]      (20)

The transformation of x to e through L^{-1} and the corresponding
boundary of integration are shown in Figure 7.

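We see then from (20) and Figure 7 that to maintain a constant false
alarm rate, we must have:

    \ln\left[ (2\pi)^{N/2} |K|^{1/2} \lambda \right] = \text{constant}    (21a)

or

    \lambda = \frac{C}{(2\pi)^{N/2} |K|^{1/2}}                            (21b)

where C is a constant.  Substituting (21) into (6), we obtain:

    \mathbf{e}^T D^{-1} \mathbf{e} > f(K,\lambda)
        = -\ln\left[ (2\pi)^N |K| \right] - 2 \ln \lambda
        = -\ln\left[ (2\pi)^N |K| \right] + \ln\left[ (2\pi)^N |K| \right] - 2 \ln C
        = \text{constant}                                                 (22)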


Finally, then, our approximate significance test in (12) based on a
fixed-order predictor is given by

    \sum_{k=1}^{N} \frac{\tilde{e}_k^2}{\tilde{\sigma}_k^2} > \text{constant}      (23)

Equation (23) lends itself to a simple, intuitively pleasing
interpretation.  Let us suppose that the background follows our assumed
fixed-order autoregressive model.  Then generating \tilde{e}_k represents
an attempt to first whiten the data.  Normalization by the variance of the
white residual gives equal weight to the generally nonstationary image
pixels.  Finally, we perform a significance test on sets of N (equally
"important") white Gaussian samples of unit variance.
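
The constant-threshold property can be illustrated with a small Monte Carlo experiment.  The sketch below assumes the exact test with a known covariance K (not the fixed-order approximation), picks the constant as the empirical (1 - α) quantile of a sum of N squared unit-variance white Gaussian samples, and then measures the false alarm rate for two quite different, hypothetical background covariances.  All names and values are illustrative.

```python
import numpy as np

def normalized_statistic(X, K):
    """x^T K^{-1} x for each zero-mean sample (one sample per row of X)."""
    Y = np.linalg.solve(K, X.T)              # K^{-1} x for every sample
    return np.sum(X.T * Y, axis=0)

def false_alarm_rate(K, threshold, trials, rng):
    """Fraction of pure-background samples drawn from N(0, K) that exceed the threshold."""
    C = np.linalg.cholesky(K)
    X = rng.standard_normal((trials, K.shape[0])) @ C.T
    return float(np.mean(normalized_statistic(X, K) > threshold))

rng = np.random.default_rng(1)
N, alpha = 16, 0.05

# Constant threshold: empirical (1 - alpha) quantile of a sum of N white squares
white = rng.standard_normal((200_000, N))
threshold = float(np.quantile(np.sum(white ** 2, axis=1), 1.0 - alpha))

# Two hypothetical backgrounds: strongly correlated with high power, and white with low power
idx = np.arange(N)
K_smooth = 4.0 * 0.9 ** np.abs(np.subtract.outer(idx, idx))
K_rough = 0.25 * np.eye(N)
rate_smooth = false_alarm_rate(K_smooth, threshold, 200_000, rng)
rate_rough = false_alarm_rate(K_rough, threshold, 200_000, rng)
```

Under these assumptions both measured rates should be close to α, even though the two covariances differ in both scale and correlation structure.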

5.  ADAPTIVE  ESTIMATION 

The significance test of the previous sections depends on knowledge
of the background statistics; i.e., we need to know or estimate the
coefficients of the assumed space-varying autoregressive model.  However,
in attempting this estimation, we encounter the uncertainty principle.
That is, to obtain a reliable estimate of a(n,m;j,k), we require
stationarity over a "sufficiently large" window size.  On the other hand,
we assume statistics are generally changing everywhere in space.

To side-step this problem, we assume that the data is in fact
stationary over the extent of what we shall refer to as the estimation
window, w_e(n,m).  The location of the 2-D estimation window which slides
over our data will be designated by the center index (n_0,m_0), as
illustrated in Figure 8.  The model parameters associated with this window
are defined to be those at (n_0,m_0): a(n_0,m_0;j,k).

The particular least squares estimation procedure we shall use is the
covariance method of linear prediction [9].  The prediction error
associated with the estimation window at spatial coordinates (n_0,m_0) is
the error e(n_0,m_0) in predicting the value x(n_0,m_0) from its neighbors
weighted by a(n_0,m_0;j,k).  Finally, the prediction error variance
σ^2(n_0,m_0) is given by the average squared prediction error under the
estimation window at (n_0,m_0), based on the coefficients a(n_0,m_0;j,k).

Fig.  8.  Representation of the estimation window.

For each pixel location (n_0,m_0), we wish to estimate the set of model
parameters a(n_0,m_0;j,k) which vary in space.  To do this, we assume that
x(n,m) is a 2-D random field stationary over each w_e(n-n_0,m-m_0), and
follows the model (10), but where the parameters a(n,m;j,k) and σ(n,m) are
assumed constant with respect to n and m over the estimation window.
Therefore, throughout the remainder of this section, we drop the spatial
dependence and work with the model given by

    x(n,m) = \sum\sum_{(j,k) > (0,0)} a(j,k) \, x(n-j,m-k) + \sigma \, w(n,m)      (24)

We shall assume that the prediction coefficients a(j,k) fall within a
(PxQ) first-quadrant plane mask.  For simplicity, we limit our derivations
to this class of prediction masks, although the approach is clearly
applicable to more general mask shapes, such as other quadrant masks and
non-causal masks.  Our objective is to estimate from x(n,m) the model
parameters a(j,k) for j = 0,1,...,P-1 and k = 0,1,...,Q-1, with
(j,k) ≠ (0,0).  Further, let us suppose that we have available x(n,m) for
(n,m) ∈ [-P+n_1, n_2] x [-Q+m_1, m_2] (see Figure 9).  We then define the
error e(n,m) over the region I = [n_1,n_2] x [m_1,m_2], given by

    e(n,m) = x(n,m) - \sum_{j=0}^{P-1} \sum_{\substack{k=0 \\ (j,k) \neq (0,0)}}^{Q-1}
             a(j,k) \, x(n-j,m-k), \qquad (n,m) \in I                     (25)

Our goal becomes to minimize the sum of the squared errors given by

    E[n_0,m_0] = \sum_{n=n_1}^{n_2} \sum_{m=m_1}^{m_2} e^2(n,m)           (26)

Note that the region I over which e(n,m) is given is equivalent to the
region under w_e(n-n_0,m-m_0), but that in determining e(n,m), we have
used some data along the border of w_e(n-n_0,m-m_0).
Fig.  9.  Known data blocks used in 2-D least squares.


The approach we take is to transform the 2-D problem to a 1-D problem
so that a 1-D least squares solution is applicable.  Note, however, that
we will still have solved the 2-D least squares problem.  In particular,
we wish to transform (26) into a 1-D error expression.  To accomplish this
transformation, we define the vectors a[n_0,m_0] and s by

    \mathbf{a}[n_0,m_0] = [\, a(0,1), \ldots, a(0,Q-1), \ a(1,0), \ldots, a(1,Q-1), \ \ldots, \ a(P-1,0), \ldots, a(P-1,Q-1) \,]^T
    \mathbf{s} = [\, x(n_1,m_1), \ldots, x(n_1,m_2), \ \ldots, \ x(n_2,m_1), \ldots, x(n_2,m_2) \,]^T      (27)

and the matrix S by

    S = [\, A_0^T \ \ A_1^T \ \ \cdots \ \ A_{M-1}^T \,]^T                (28a)

where A_j is the M x (PQ-1) matrix whose l-th row (l = 0, ..., M-1) holds
the samples used in predicting x(n_1+j, m_1+l):

    [A_j]_l = [\, x(n_1+j, m_1+l-1), \ldots, x(n_1+j, m_1+l-Q+1), \ \ldots, \
                 x(n_1+j-P+1, m_1+l), \ldots, x(n_1+j-P+1, m_1+l-Q+1) \,]      (28b)

and where we have assumed the data segment I to be of extent MxM.  Note
that s is a vector consisting of the concatenation of the rows of x(n,m)
over I, a[n_0,m_0] is a vector consisting of the concatenation of the rows
of a(j,k) for (j,k) ∈ [0,P-1] x [0,Q-1] with (j,k) ≠ (0,0), and S is a
matrix which consists of the concatenation of rows of various subsequences
of the known x(n,m) required in predicting each value of x(n,m) over I.

Therefore, we can write (26) as:

    E[n_0,m_0] = \sum_{n=n_1}^{n_2} \sum_{m=m_1}^{m_2} e^2(n,m)
               = (S \mathbf{a}[n_0,m_0] - \mathbf{s})^T (S \mathbf{a}[n_0,m_0] - \mathbf{s})      (29)

We then write the solution to minimizing (29) with respect to a[n_0,m_0]
as [9]:

    \mathbf{a}[n_0,m_0] = R^{-1} S^T \mathbf{s}                           (30a)

where

    R = S^T S                                                             (30b)
Note that the matrix R is of extent (PQ-1)x(PQ-1), so that the
computation required in its inversion is dependent on the model order.

In particular, since R is generally not Toeplitz, its inversion will
require on the order of (PQ-1)^3 operations.  Thus, assuming P,Q << M, the
bulk of the computation is embedded within forming R = S^T S, which
requires on the order of M^2 operations.

This estimation is then carried out at each pixel.  An alternative to
this direct estimation is to accomplish the estimation recursively [13].
However, this appears to be a viable alternative only when the estimation
window size is less than the model order [14]; i.e., the matrix required
to be inverted at each pixel is on the order of MxM.  We are currently
investigating methods to reduce this computation.

In either case, we obtain a parameter set at each pixel which
represents an estimate of the model parameters of the changing background,
required in our prediction procedure.  Finally, it is straightforward to
show from (29) and (30) that the estimate of the prediction error variance
given by the average squared prediction error under each estimation window
can be expressed by

    \sigma^2(n_0,m_0) = \left( \mathbf{s}^T \mathbf{s}
        - \mathbf{a}[n_0,m_0]^T (S^T S) \, \mathbf{a}[n_0,m_0] \right) M^{-2}      (31)

where M^2 is the number of points in the MxM data segment I.
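
The estimation steps (25) through (31) can be sketched in a few lines of Python.  This is an illustrative direct implementation under stated assumptions, not the recursive procedure of [13]: the function name and argument layout are hypothetical, the image is assumed to be zero-mean and indexed as x[n, m], and the variance is taken as the average squared residual over the points of I.

```python
import numpy as np

def fit_quadrant_predictor(x, n1, n2, m1, m2, P, Q):
    """Least squares fit of a (P x Q) first-quadrant predictor over
    I = [n1, n2] x [m1, m2], following (25)-(31).

    Requires n1 >= P-1 and m1 >= Q-1 so that the needed border samples exist.
    Returns ({(j, k): a(j, k)}, sigma2).
    """
    offsets = [(j, k) for j in range(P) for k in range(Q) if (j, k) != (0, 0)]
    rows, targets = [], []
    for n in range(n1, n2 + 1):
        for m in range(m1, m2 + 1):
            rows.append([x[n - j, m - k] for (j, k) in offsets])  # one row of S
            targets.append(x[n, m])                               # one element of s
    S = np.array(rows, dtype=float)
    s = np.array(targets, dtype=float)
    R = S.T @ S                                  # (30b)
    a = np.linalg.solve(R, S.T @ s)              # (30a), without forming R^{-1}
    sigma2 = float((s @ s - a @ R @ a) / s.size) # (31): average squared residual
    return dict(zip(offsets, a)), sigma2
```

For a 2 x 2 mask the returned dictionary has the keys (0,1), (1,0), and (1,1), matching the third order model (11).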


6.  IMPLEMENTATION  OF  THE  DETECTION  ALGORITHM

We are now ready to merge the results of the previous sections to
form an implementation of our target detection algorithm.  From our
coefficient estimates (30), we compute the prediction error function
\tilde{e}(n,m) based on a fixed-order, but space-varying prediction model:

    \tilde{e}(n,m) = x(n,m) - \sum_{j=0}^{P-1} \sum_{\substack{k=0 \\ (j,k) \neq (0,0)}}^{Q-1}
                     a(n,m;j,k) \, x(n-j,m-k)

Then, with the variance estimate of (31), denoted \tilde{\sigma}^2(n,m),
we can write a 2-D version of the approximate significance test of (23)
as:

    \sum_{(k,l) \in S} \frac{\tilde{e}^2(k,l)}{\tilde{\sigma}^2(k,l)} > \text{constant}      (32)

where we can think of the indices k and l as running over different
regions S which sweep over the image.

Equivalently, we can consider generating the statistic in (32) at
each spatial location (n,m) of an image by convolving an LxL = N-point
smoothing window, w_s(n,m) (which falls over our region S), with the
normalized prediction error to create a new smoothed function E_s(n,m):

    E_s(n,m) = q(n,m) ** w_s(n,m)                                         (33a)

where ** denotes 2-D convolution and

    q(n,m) = \tilde{e}^2(n,m) / \tilde{\sigma}^2(n,m)                     (33b)

In the estimation of the model parameters, the estimation window
$w_e(n,m)$ should be small enough to preserve approximate stationarity,
but large enough to obtain a reliable estimate of the required correlation
coefficients. The estimation window must also be large enough so that
anomalies (i.e., objects) do not badly corrupt the correlation estimates.

The smoothing window should be small enough so that small-extent
objects are not overwhelmed by background in the significance test.
However, it may also need to be large enough that boundary effects in our
finite-order model assumption do not play a significant role.

The overall detection algorithm based on the approximate significance
test is illustrated in Figure 10. The first operation subtracts an
estimate of the local mean of x(n,m), which is computed by averaging
x(n,m) under $w_e(n,m)$. (Recall that our significance test requires a
zero-mean random field.) Under the estimation window, a local covariance
matrix R as defined in (30b) is computed. R is then used to find the
coefficient vector a[n,m] and the prediction error variance
$\hat{\sigma}^2(n,m)$ in (30) and (31), respectively, which are required
to compute the normalized prediction error, q(n,m). Finally, q(n,m) is
convolved with the smoothing window $w_s(n,m)$ and compared to a
threshold.
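
The chain of operations just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results in this report: it assumes a three-parameter first-quadrant model, a direct least-squares fit at every pixel, a uniform (box) smoothing window, and zero output where the estimation window does not fit. The function names, the test image and the threshold value are hypothetical.

```python
import numpy as np

def normalized_error(x, est=10):
    """Normalized squared prediction error q(n,m), 3-parameter first-quadrant model."""
    x = np.asarray(x, dtype=float)
    rows, cols = x.shape
    h = est // 2
    q = np.zeros_like(x)
    for n in range(h + 1, rows - h):
        for m in range(h + 1, cols - h):
            # Estimation window plus one extra row/column of past samples.
            blk = x[n - h - 1:n + h, m - h - 1:m + h]
            blk = blk - blk[1:, 1:].mean()          # subtract the local mean
            s = blk[1:, 1:].ravel()                 # x(k, l)
            S = np.column_stack([
                blk[1:, :-1].ravel(),               # x(k, l-1)
                blk[:-1, 1:].ravel(),               # x(k-1, l)
                blk[:-1, :-1].ravel(),              # x(k-1, l-1)
            ])
            a, *_ = np.linalg.lstsq(S, s, rcond=None)
            resid = s - S @ a
            var = resid @ resid / s.size            # prediction error variance
            e = resid.reshape(2 * h, 2 * h)[h, h]   # error at pixel (n, m)
            q[n, m] = e * e / var if var > 0 else 0.0
    return q

def box_smooth(q, size=3):
    """Convolve q with a uniform size x size window (zero padding at the edges)."""
    pad = size // 2
    qp = np.pad(q, pad)
    out = np.zeros_like(q)
    for i in range(size):
        for j in range(size):
            out += qp[i:i + q.shape[0], j:j + q.shape[1]]
    return out / size**2

def detect(x, threshold, est=10, size=3):
    """Return the smoothed statistic E_s and the thresholded detection map."""
    E = box_smooth(normalized_error(x, est), size)
    return E, E > threshold

# Hypothetical test image: white-noise background with one 2x2 constant object.
rng = np.random.default_rng(1)
image = rng.standard_normal((32, 32))
image[15:17, 15:17] = 10.0
E, hits = detect(image, threshold=4.0)
```

Fitting the coefficients at every pixel keeps the sketch close to the description above, at the cost of one small least-squares problem per pixel.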




7.  EXAMPLES 


In this section, we present a number of examples based on the
detection algorithm developed in the previous sections. Throughout this
section, we have chosen the estimation window $w_e(n,m)$ to be of size
10x10 pixels, which we assume is "sufficiently" larger than the size of
most objects. This assumption can be justified through our empirical
observation that in most cases the coefficient change function (CCF) [14]
of our processed images is relatively flat; i.e., the objects' presence
appears not to adversely affect estimation of background statistics. We
also assume a 10x10 window is large enough to obtain a good estimate of
the correlation coefficients required in estimating a[n,m] and
$\hat{\sigma}^2(n,m)$, but also small enough to maintain approximate
stationarity. This assumption, of course, breaks down at region
boundaries.

In our first examples, we consider computer-generated 1-D and 2-D
signals determined by exciting all-pole filters with white noise. We then
analyze progressively more complicated real images which we have obtained
from the Rome Air Force Development Center (RADC) data base.

EXAMPLE 1:

Consider a sequence x(n) of the form:

$$x(n) = 0.95\,x(n-1) + w(n) \tag{34}$$

where w(n) is zero-mean white noise. A sample function of x(n) is shown
in Figure 11.a and a 1-point "object" at n = 64 is shown in Figure 11.b.
The (single) coefficient estimate was based on a 16-point estimation
window. Figure 11.c shows the squared prediction error $\tilde{e}^2$.
The object is clearly detected.

Consider a second sequence, depicted in Figure 12.a, of the form in
(34) created with a different white-noise input. A four-point object has
been implanted at locations n = 90, 91, 92 and 93. As before, the (single)
coefficient estimate was based on a 16-point estimation window. The
squared prediction error, illustrated in Figure 12.b, gives a clear
indication of the object.

Fig. 12. Detection of 4-point object in Example 1: (a) 1-D random
sequence with object; (b) squared prediction error.
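
An experiment of the kind described in Example 1 can be sketched in Python as follows. This is a minimal illustration only: the random seed, the sequence length of 128, the object amplitude of 20, and the omission of local-mean removal are assumptions, and `ar1_squared_error` is a hypothetical helper name.

```python
import numpy as np

def ar1_squared_error(x, win=16):
    """Squared prediction error of a one-coefficient predictor re-estimated per sample."""
    x = np.asarray(x, dtype=float)
    half = win // 2
    e2 = np.zeros(x.size)
    for n in range(half + 1, x.size - half):
        cur = x[n - half:n + half]              # x(k) under the estimation window
        prev = x[n - half - 1:n + half - 1]     # x(k-1) under the same window
        denom = prev @ prev
        a_hat = (cur @ prev) / denom if denom > 0 else 0.0
        e2[n] = (x[n] - a_hat * x[n - 1]) ** 2
    return e2

# Background of the form (34), driven by unit-variance white noise.
rng = np.random.default_rng(0)
w = rng.standard_normal(128)
x = np.zeros(128)
for n in range(1, 128):
    x[n] = 0.95 * x[n - 1] + w[n]

x[64] += 20.0                                   # hypothetical 1-point object
e2 = ar1_squared_error(x)
peak = int(np.argmax(e2))                       # location of the largest error
```

A one-point object disturbs the prediction both at its own location and at the following sample, where it serves as the predictor input, so the largest error is expected at n = 64 or n = 65.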


EXAMPLE  2: 


Figure 13.a depicts a 1-D slice of an aerial photograph consisting of
a grove of trees. In Figure 13.b, we have implanted a one-point object at
n = 64. In this experiment, a six-parameter noncausal model was assumed:

$$x(n) = \sum_{k=-3,\; k \neq 0}^{3} a(k)\,x(n-k) + w(n) \tag{35}$$

where w(n) is white noise. The estimation window was chosen to be ten
points in duration. The squared prediction error, shown in Figure 13.c,
clearly picks out the object. For reference, Figure 13.d depicts the
squared prediction error without the one-point object. A second object
and its corresponding squared prediction error are shown in Figures 14.a
and 14.b, respectively. The estimation window is sixteen points in
duration and a two-point noncausal model is assumed.

Fig. 13. Detection of 1-point object in Example 2: (a) slice of
trees; (b) slice of trees with object; (c) squared prediction
error with object; (d) squared prediction error without object.


Fig. 14. Detection of 4-point object in Example 2: (a) slice of
trees with object; (b) squared prediction error.


EXAMPLE  3: 


Consider a 2-D sequence generated by the particular 2-D difference
equation of the form:

$$x(n,m) = a(0,1)\,x(n,m-1) + a(1,0)\,x(n-1,m) + a(1,1)\,x(n-1,m-1) + w(n,m) \tag{36}$$

The background sequence (64x64 pixels in extent) was generated with
coefficients a(0,1) = 0.1, a(1,0) = -0.9 and a(1,1) = 0.1. Four objects
were implanted within the image, all of a constant level, but with a
variance about equal to that of the background. Moreover, the size and
level of the anomalies were chosen to be visually difficult to detect from
the background (see Figure 15.a). The model assumed in the estimation
procedure is given by the generating process (36).
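
A background of the form (36) with the coefficients quoted above can be generated with a short Python sketch such as the one below. The zero boundary conditions, the unit-variance Gaussian excitation, the 64-sample burn-in margin that is discarded, and the function name `synth_field` are assumptions made for illustration; no objects are implanted here.

```python
import numpy as np

def synth_field(shape=(64, 64), a01=0.1, a10=-0.9, a11=0.1, burn=64, seed=0):
    """Generate a 2-D field by running the difference equation (36) on white noise."""
    rng = np.random.default_rng(seed)
    rows, cols = shape[0] + burn, shape[1] + burn
    w = rng.standard_normal((rows, cols))
    x = np.zeros((rows, cols))
    for n in range(rows):
        for m in range(cols):
            acc = w[n, m]
            if m > 0:
                acc += a01 * x[n, m - 1]        # a(0,1) x(n, m-1)
            if n > 0:
                acc += a10 * x[n - 1, m]        # a(1,0) x(n-1, m)
            if n > 0 and m > 0:
                acc += a11 * x[n - 1, m - 1]    # a(1,1) x(n-1, m-1)
            x[n, m] = acc
    # Discard the burn-in margin so boundary transients have less influence.
    return x[burn:, burn:]

field = synth_field()
```

Refitting the three coefficients to `field` by least squares should return values close to those used for generation, which gives a quick consistency check on the generator.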

The 3-D perspective of the squared prediction error $\tilde{e}^2(n,m)$
is given in Figure 16. All four objects are clearly detected and even the
two closely spaced objects are resolved. This same function, along with
the smoothed $\tilde{e}^2(n,m)$ (a 3x3 smoothing window, $w_s(n,m)$, was
applied in this example), is illustrated in Figures 15.b and 15.c after
thresholding. Figure 15.d shows the prediction error variance, and
Figures 15.e and 15.f show the smoothed normalized prediction error, both
appropriately thresholded.

Note that two different thresholds are applied to the smoothed
normalized prediction error. The first resolves three of the four objects;
the second resolves all four objects, but introduces false alarms. This
is due to the inaccuracies of the estimate of the prediction error
variance, which is illustrated in Figure 15.d. Ideally, since the
background is stationary, the estimated prediction error variance should
be flat. However, as seen in Figure 15.d, the estimate actually peaks in
the region of objects, contrary to what we would hope to happen. We have
encountered in this synthetic example what is, perhaps, a fundamental
limitation in measuring the background prediction error variance: the
presence of objects can (falsely) increase the background residual
variance. With a priori knowledge that the background prediction error
variance is constant, we were able to improve detection.

Fig. 15. Detection of objects in test image for Example 3.
(a) test image with four objects, (b) prediction error,
(c) smoothed prediction error, (d) prediction error variance,
(e) smoothed normalized prediction error (high threshold),
(f) smoothed normalized prediction error (low threshold).

EXAMPLE 4:


Figure 17.a depicts a 64x64-pixel RADC image in which two 2x2-pixel
synthetic objects (of constant level) have been implanted. This image was
created by a 64-to-1 downsampling and smoothing of the original image.
The assumed background model is the same three-parameter model used in the
previous example in (36). Figures 17.b-17.e depict the prediction error,
prediction error variance, and smoothed normalized prediction error (a 4x4
smoother, $w_s(n,m)$, was applied), respectively. The processed part of
the image is given within the boxed area. Note that normalization of the
prediction error in this case (unlike the previous example) has helped in
bringing out the object from the busier field background.

Fig. 17. Detection of objects in RADC image for Example 4.
(a) image with two synthetic objects, (b) prediction error,
(c) prediction error variance, (d) smoothed normalized
prediction error (high threshold), (e) smoothed normalized
prediction error (low threshold).

EXAMPLE 5:


Figure 18.a depicts a 64x64-pixel RADC image in which two 3x3-pixel
synthetic objects (of constant level) have been implanted. This
field-tree image was created by a 64-to-1 downsampling and smoothing of
the original image. In our first attempt to detect the two synthetic
objects, the three-parameter model of (36) was assumed. Although the
object in field background was easily detected, the object in tree
background was not detected, even with normalization by the prediction
error variance.

Consequently, in our second attempt at detection, we assumed a
twelve-parameter non-symmetric half-plane [11] autoregressive model. This
model is more general and thus more likely to accurately model the
background [11]. Figures 18.b-18.e depict the prediction error, prediction
error variance, and smoothed normalized prediction error (a 4x4
$w_s(n,m)$ was applied), respectively. Because of the computational
intensity with a twelve-parameter model, only the designated region was
processed. Note that normalization of the prediction error has helped
significantly in bringing out from the background the object embedded
within the trees.


Fig. 18. Detection of objects in RADC image for Example 5.
(a) image with two synthetic objects, (b) prediction error,
(c) prediction error variance, (d) smoothed normalized
prediction error (high threshold), (e) smoothed normalized
prediction error (low threshold).

EXAMPLE 6:


Consider the RADC image displayed in Figure 19.a. This image consists
of 128x128 pixels and was created by a 16-to-1 downsampling and
smoothing of the original image. The assumed background model is the same
three-parameter model used in Example 3 in (36). Figures 19.b-19.d
depict the prediction error, smoothed prediction error and smoothed
normalized prediction error, suitably thresholded. As in our synthetic
example, the smoothed prediction error without normalization yields fewer
detected objects (which may or may not be considered false alarms) than
the smoothed normalized prediction error. This probably happens because
the background variance appears reasonably constant throughout the image.
The objects, however, can potentially introduce a false increase in the
local variance, as illustrated in Figure 19.e, which shows the thresholded
prediction error variance.

Fig. 19. Comparison of smoothed prediction error and smoothed
normalized prediction error for Example 6. (a) RADC image,
(b) smoothed normalized prediction error, (c) prediction error,
(d) smoothed prediction error, (e) prediction error variance.


EXAMPLE  7: 


Consider the RADC image displayed in Figure 20.a. This image consists
of 128x128 pixels and was created by a 16-to-1 downsampling and
smoothing of the original image. As in the previous example, a
three-parameter autoregressive model is assumed. Figure 20 makes the same
comparisons among the various residuals as made in Example 6.

Fig. 20. Comparison of smoothed prediction error and smoothed
normalized prediction error for Example 7. (a) RADC image,
(b) smoothed normalized prediction error, (c) prediction error,
(d) smoothed prediction error, (e) prediction error variance.


EXAMPLE  8: 


Consider the RADC image displayed in Figure 21.a. This image consists
of 128x128 pixels and was created by a 64-to-1 downsampling and
smoothing of the original image. In this example, we consider a
first-quadrant causal autoregressive model, a second-quadrant causal
autoregressive model, and the average of the two. This average represents
an attempt to do away with the directionality of the approximate
significance test.

Figures 21 and 22 illustrate the results with first-quadrant
(three-parameter) and second-quadrant (three-parameter) prediction masks,
respectively. Figure 23 summarizes our results by depicting the smoothed
normalized prediction errors and their average. Note that the individual
smoothed normalized prediction errors do well in detecting most objects,
while the average appears to degrade the performance.
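
One simple way to obtain a second-quadrant statistic from a first-quadrant implementation, and then to average the two, is sketched below in Python. It assumes that the second-quadrant mask is the mirror image of the first-quadrant mask about one image axis, so that the second-quadrant result can be produced by flipping the image, applying the first-quadrant statistic, and flipping back. The toy statistic `left_neighbor_error` is a hypothetical stand-in for the smoothed normalized prediction error, used only to keep the sketch short.

```python
import numpy as np

def second_quadrant(stat_fn, x):
    """Apply a first-quadrant statistic with a mirrored (second-quadrant) mask."""
    return np.fliplr(stat_fn(np.fliplr(x)))

def averaged_statistic(stat_fn, x):
    """Average of the first-quadrant and mirrored statistics."""
    return 0.5 * (stat_fn(x) + second_quadrant(stat_fn, x))

def left_neighbor_error(x):
    """Toy stand-in: squared error of predicting each pixel by its left neighbor."""
    e = np.zeros_like(x, dtype=float)
    e[:, 1:] = (x[:, 1:] - x[:, :-1]) ** 2
    return e

x = np.array([[0.0, 1.0, 3.0]])
first = left_neighbor_error(x)                     # predicts from the left
second = second_quadrant(left_neighbor_error, x)   # predicts from the right
average = averaged_statistic(left_neighbor_error, x)
```

Because the two masks respond at different samples around a discontinuity, plain averaging spreads and lowers the individual peaks, which may be one reason the averaged statistic performs less well.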

Two additional experiments that were performed with this data are
shown in Figures 24 and 25. In Figure 24, we show a thresholded version
of the coefficient change function (CCF) [14] corresponding to the
predictor of Figure 21. The CCF bears little resemblance to our prediction
errors. Moreover, due to the large estimation window (i.e., 10x10
pixels), this function is small everywhere, reflecting little
sample-to-sample change in the coefficient estimates. Finally, in Figure
25, we depict the smoothed noncausal normalized prediction error. The
noncausal prediction mask is an eight-point nearest-neighbor mask. The
results on this image and others (e.g., Example 6) are encouraging, but
appear to do no better (and perhaps worse) than the causal prediction
masks.

Fig. 21. Comparison of prediction error and smoothed normalized
prediction error with first-quadrant mask for Example 8. (a) RADC
image, (b) prediction error variance, (c) prediction error,
(d) smoothed normalized prediction error.

Fig. 22. Comparison of prediction error and smoothed normalized
prediction error with second-quadrant mask for Example 8. (a) RADC
image, (b) prediction error variance, (c) prediction error,
(d) smoothed normalized prediction error.

Fig. 23. Average of smoothed normalized prediction errors
for Example 8. (a) RADC image, (b) smoothed normalized
prediction error (1st quad), (c) smoothed normalized
prediction error (2nd quad), (d) average of (b) and (c).

EXAMPLE 9:


Consider  the  64x64  pixel  RADC  image  in  Figure  26  generated  by  down- 
sampling  the  original  image  by  16-to-l  with  smoothing.  This  image  is 
particularly  interesting  due  to  the  presence  of  a  radio  tower  in  the  lower 
right-hand  corner  of  the  image.  Note  that  the  top  of  the  radio  tower  has 
been  clearly  detected. 


It is interesting to observe that in Examples 7, 8, and 9
normalization of the prediction errors helped detection and reduced false
alarms by reducing the background variance in busy regions such as the
tree and brush areas.

Fig. 26. Comparison of prediction error and smoothed normalized
prediction error for Example 9. (a) RADC image, (b) prediction
error variance, (c) prediction error, (d) smoothed normalized
prediction error.

8. COMMENTS


The examples of the previous section clearly illustrate the success
of the 2-D prediction residual in object detection. However, there remain
many unanswered questions. For example, we have not in this report
implemented the true significance test. The difficulties involved in
estimating a large covariance matrix make this undesirable. Note also that
the true test involves a "one-shot" approach to the problem. A covariance
matrix is estimated and used in the thresholding operation. If the
estimate is bad, as it may be in object regions, we have no chance to
twiddle parameters interactively. The prediction approach, on the other
hand, has taken the true test apart into a number of components, allowing
individual twiddling of prediction parameters, normalization, etc. From
this viewpoint, it may turn out that the true test is not a good standard
and that the approximations and their various modifications may yield
better results.

Another unresolved area involves combining various approximations
such as 1st- and 2nd-quadrant predictors to approach the true test. The
few preliminary experiments with such combinations have not yielded
consistent results. Nevertheless, there may exist some theoretical
justification for combining different quadrant predictors. As we saw in
section 3.3, by simply ordering samples in our region S in different ways,
we obtain different implementations of the significance test. These
implementations involve different directionalities, e.g., 1st- or
2nd-quadrant growing prediction masks. Often the "bad" samples (i.e.,
samples not yielding prediction errors of the true test) of one
implementation are the "good" samples of another. Consequently, it seems
reasonable that there exists a way to combine these different predictors
to (theoretically) approach the true significance test.

Furthermore, since the significance test can be implemented with a
(growing) predictor of any directionality, there does not exist inherent
directionality within the test. The approximations, however, impose their
own specific directionality. Recall, however, that the approximation can
be made safely only when the background has associated with it a certain
directionality (e.g., 1st-quadrant or 2nd-quadrant models). Therefore,
the validity of the directionality imposed by the approximation is
directly linked to the assumption that the background has some sort of
directionality associated with it, which is perhaps an unreasonable
assumption.

In  section  5,  we  only  touched  upon  the  estimation  problem.  Clearly, 
we  might  seek  better  estimates  of  the  model  parameters  and  of  background 
variance which is used in normalization. An iterative technique is one
possibility.  On  each  iteration,  we  might  estimate  background  statistics 
from  pixels  which  do  not  include  current  object  samples.  Alternatively, 
we  might  fill  in  what  we  think  are  object  areas  with  a  signal  predicted 
from  the  background.  The  former  case  raises  questions  about  estimation 
with  missing  data. 

An interesting characteristic of this detection algorithm is that
lines and edges of regions tend to be suppressed, while anomalous areas
are enhanced. It is also at lines and region boundaries where our
modeling assumptions break down. A better understanding is needed of the
response of the prediction error in such areas. For example, an "optimal"
size of the region S may reduce the probability of false alarm in these
regions and increase detections in anomalous areas.

Finally, we might consider introducing some general a priori knowledge
about the objects of interest. A procedure for introducing such knowledge
in a significance test remains to be understood. One possible approach
might involve developing a "cross" between significance and hypothesis
testing.